AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsApr-24-2026, 21:49:07 GMT

1a82986c9f321217f2ed407a14dcfa0b-Supplemental-Conference.pdf

artificial intelligence, assumption 1, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsFeb-16-2026, 20:30:32 GMT

Multi-language Diversity Benefits Autoformalization

Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence.

large language model, logic & formal reasoning, machine learning, (21 more...)

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

arXiv.org Artificial IntelligenceDec-2-2025

Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

Zhou, Xinyuan, Lei, Yi, Zhou, Xiaoyu, Sun, Jingyi, Zhu, Yu, Ye, Zhongyi, Zhang, Weitai, Liu, Quan, Wei, Si, Liu, Cong

Large Language Models (LLMs) have shown significant promise in automated theorem proving, yet progress is often constrained by the scarcity of diverse and high-quality formal language data. To address this issue, we introduce Spark-Prover-X1, a 7B parameter model trained via an three-stage framework designed to unlock the reasoning potential of more accessible and moderately-sized LLMs. The first stage infuses deep knowledge through continuous pre-training on a broad mathematical corpus, enhanced by a suite of novel data tasks. Key innovation is a "CoT-augmented state prediction" task to achieve fine-grained reasoning. The second stage employs Supervised Fine-tuning (SFT) within an expert iteration loop to specialize both the Spark-Prover-X1-7B and Spark-Formalizer-X1-7B models. Finally, a targeted round of Group Relative Policy Optimization (GRPO) is applied to sharpen the prover's capabilities on the most challenging problems. To facilitate robust evaluation, particularly on problems from real-world examinations, we also introduce ExamFormal-Bench, a new benchmark dataset of 402 formal problems. Experimental results demonstrate that Spark-Prover achieves state-of-the-art performance among similarly-sized open-source models within the "Whole-Proof Generation" paradigm. It shows exceptional performance on difficult competition benchmarks, notably solving 27 problems on PutnamBench (pass@32) and achieving 24.0\% on CombiBench (pass@32). Our work validates that this diverse training data and progressively refined training pipeline provides an effective path for enhancing the formal reasoning capabilities of lightweight LLMs. We will release both Spark-Prover-X1-7B and Spark-Formalizer-X1-7B, along with the ExamFormal-Bench dataset, in the near future.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2511.13043

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceNov-25-2025

REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

Shen, Ziju, Huang, Naohao, Yang, Fanyi, Wang, Yutong, Gao, Guoxiong, Xu, Tianyi, Jiang, Jiedong, He, Wanyi, Yang, Pu, Sun, Mengzhou, Ju, Haocheng, Wu, Peihao, Dai, Bryan, Dong, Bin

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tune achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset-comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).

large language model, logic & formal reasoning, machine learning, (19 more...)

2505.20613

Country:

Asia (0.46)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Setting (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Neural Information Processing SystemsNov-20-2025, 03:17:36 GMT

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

I do believe that problems are the heart of mathematics.

artificial intelligence, large language model, machine learning, (16 more...)

Country: Asia > China > Shanghai > Shanghai (0.05)

Genre:

Research Report (0.68)
Instructional Material > Course Syllabus & Notes (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceOct-21-2025

Lean Finder: Semantic Search for Mathlib That Understands User Intents

Lu, Jialin, Emond, Kye, Yang, Kaiyu, Chaudhuri, Swarat, Sun, Weiran, Chen, Wuyang

We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians. We further align Lean Finder with mathematicians' preferences using In addition, Lean Finder is compatible with LLM-based theorem provers, bridging retrieval with formal reasoning. Advances in Lean and mathlib (De Moura et al., 2015; Moura & Ullrich, 2021) are turning mathematical discovery into a collaborative and verifiable research workflow. Despite these advances, state-of-the-art LLMs still cannot solve math research problems. Lean's syn tax, gram mar, and tac tics in cur a steep learn ing curve. All experiments and data processing were conducted outside Meta. Figure 1: In the evaluation with user queries, real users preferred Lean Finder in 81.6% of cases, compared with Consider the two queries below. Lean search engines handle (Gao et al., 2024a;b; Ju & Dong, 2025; Asher, 2025): Denote L/K a field extension, x, y in L are algebraic elements over K with the same minimal polynomial. I'm working with algebraic elements over a field extension and I have two elements, say x and y in L. I know x is algebraic over K, and I've shown that y is a root of the minimal polynomial of x. Does this imply that the minimal polynomials of x and y are actually equal? T arget Statement 2: 1 theorem eq_of_root {x y: L} (hx: IsAlgebraic K x) (h_ev: Polynomial.aeval y (minpoly K x) = 0): minpoly K y = minpoly K x):= -- proof omitted for brevity This user latent (motivation, perspective, abstraction) cannot be inferred or encoded by a purely syntactic informalization. Addressing this challenge calls for Lean search engines that can understand a mathematician's intent, not merely We defer a more rigorous analysis in Section 2.2, and ask our core question: Our approach analyzes and clusters public discussions, then synthesizes queries that simulate user intents (Section 3.1).

large language model, logic & formal reasoning, machine learning, (22 more...)

2510.1594

Genre: Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Law (0.92)
Government (0.92)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Sivakumar, Jasivan Alex, Borchert, Philipp, Cardenas, Ronald, Lampouras, Gerasimos

Conjecturing: An Overlooked Step in Formal Mathematical Reasoning

arXiv.org Artificial IntelligenceOct-15-2025

Autoformalisation, the task of expressing informal mathematical statements in formal language, is often viewed as a direct translation process. Many mathematical problems cannot be formalised directly without first conjecturing a conclusion such as an explicit answer, or a specific bound. Since Large Language Models (LLMs) already struggle with autoformalisation, and the evaluation of their conjecturing ability is limited and often entangled within autoformalisation or proof, it is particularly challenging to understand its effect. To address this gap, we augment existing datasets to create ConjectureBench, and redesign the evaluation framework and metric specifically to measure the conjecturing capabilities of LLMs both as a distinct task and within the autoformalisation pipeline. However, the conjecture should not be assumed to be provided. We demonstrate that while LLMs possess the requisite knowledge to generate accurate conjectures, improving autoformali-sation performance requires treating conjecturing as an independent task, and investigating further how to correctly integrate it within autoformalisation. Finally, we provide forward-looking guidance to steer future research toward improving conjecturing, an overlooked step of formal mathematical reasoning. Natural language reasoning with Large Language Models (LLMs) has emerged as a powerful tool for solving complex mathematical problems. Its effectiveness is highlighted by recent breakthroughs, such as AI systems from OpenAI and Google solving five of six problems from the 2025 International Mathematics Olympiad (IMO) using natural language (Metz, 2025). The critical caveat is that these informal solutions require validation by expert mathematicians, a process that is prone to human error and lack scalability (Gou ezel & Shchur, 2019). Proof assistants like Isabelle (Wenzel et al., 2008) and Lean (Moura & Ullrich, 2021) provide a path toward automated verification at scale through formal reasoning.

conjecture, large language model, machine learning, (19 more...)

2510.11986

Country: Europe (0.28)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Li, Yupei, Borchert, Philipp, Lampouras, Gerasimos

TopoAlign: A Framework for Aligning Code to Math via Topological Decomposition

arXiv.org Artificial IntelligenceOct-15-2025

Large Language Models (LLMs) excel at both informal and formal (e.g. Lean 4) mathematical reasoning but still struggle with autoformalisation, the task of transforming informal into formal mathematical statements. Autoformalisation helps pair the informal reasoning of LLMs with formal proof assistants which enable machine-verifiable generation and mitigate hallucinations. Yet, the performance of current Math LLMs is constrained by the scarcity of large-scale corpora, particularly those containing pairs of informal and formal statements. Although current models are trained to generate code from natural language instructions, structural and syntactic differences between these and formal mathematics limit effective transfer learning. We propose TopoAlign, a framework that unlocks widely available code repositories as training resources for Math LLMs. TopoAlign decomposes code into docstrings, main functions, and dependency functions, and reassembles these components into analogues that structurally mirror formal statements. This produces structurally aligned code data that can be used for training Math LLMs without requiring additional human annotation. We train two state-of-the-art models, DeepSeek-Math and Herald, and evaluate them on the minif2f, Putnam, and ProofNet benchmarks. TopoAlign provides substantial gains for DeepSeek-Math, improving performance by 17.77% on BEq@10 and 68.82% on typecheck@10. Despite introducing no new mathematical knowledge, our framework achieves gains of 0.12% and 1.09% for Herald on BEq@10 and typecheck@10, respectively, demonstrating that training on aligned code data is beneficial even for specialized models.

large language model, machine learning, natural language, (21 more...)